Submitted by: Hima Mohandas, Shahina Hayat, Robert Jordan, Shawn Kovacs (Group 5)
Gold has an allure and a price tag, though some would argue it has little intrinsic value; something about its glint and sparkle has always appealed to humans, and gold is one of the few metals that have served widely as currency. Investors hold gold as a safe store of value, while others seek profit by buying at a low price and selling at a higher one. Time series forecasting is therefore an important business application for anticipating the near future. This project analyzes historical gold prices, obtained from Yahoo Finance, to predict prices in the near future. We implement several models and compare their performance.
Some assumptions made about the dataset:

- The data were collected in a non-biased manner.
- Future values can be predicted from historical data.
- The data are stationary. A stationary process has a mean, variance, and autocorrelation structure that do not change over time. At a minimum, averaging and smoothing models assume the series is locally stationary with a slowly varying mean, so we take a moving (local) average to estimate the current mean and use it as the forecast for the very short term.
- For OLS regression with time series data, the assumptions required for unbiasedness of beta change; we now require only (i) linearity in parameters, (ii) no perfect collinearity, and (iii) the zero conditional mean assumption.
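Under this local-stationarity assumption, the short-horizon forecast is simply the moving (local) average of the most recent observations. A minimal sketch (the prices and the 4-observation window here are illustrative values, not the project's data):

```python
import pandas as pd

def moving_average_forecast(series: pd.Series, window: int = 20) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    return series.tail(window).mean()

prices = pd.Series([84.9, 85.6, 85.1, 84.8, 86.8, 86.6, 88.3, 88.6])
# With a window of 4, the forecast is the mean of the last four prices
forecast = moving_average_forecast(prices, window=4)
```

The same idea underlies the smoothing models discussed later: as long as the mean drifts slowly, the recent average is a reasonable very-short-term predictor.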
The data are downloaded from Yahoo Finance and contain historical gold prices from January 1, 2008 to the present.
# LinearRegression is a machine learning library for linear regression
from sklearn.linear_model import LinearRegression
# pandas and numpy are used for data manipulation
import pandas as pd
import numpy as np
from datetime import date
import seaborn as sns
# matplotlib and seaborn are used for plotting graphs
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-darkgrid')
# yahoo finance is used to fetch data
import yfinance as yf
# Read data
START = '2008-01-01'
TODAYS_DATE = date.today()
df = yf.download('GLD', START, TODAYS_DATE)
df.shape
(3414, 6)
df = df.reset_index()
df
| Date | Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|---|
| 0 | 2008-01-02 | 83.559998 | 85.139999 | 83.440002 | 84.860001 | 84.860001 | 12291100 |
| 1 | 2008-01-03 | 84.870003 | 85.940002 | 84.599998 | 85.570000 | 85.570000 | 9553900 |
| 2 | 2008-01-04 | 85.339996 | 85.550003 | 84.430000 | 85.129997 | 85.129997 | 8402200 |
| 3 | 2008-01-07 | 85.239998 | 85.260002 | 84.570000 | 84.769997 | 84.769997 | 6944300 |
| 4 | 2008-01-08 | 86.279999 | 87.129997 | 86.160004 | 86.779999 | 86.779999 | 9567900 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3409 | 2021-07-19 | 169.509995 | 169.910004 | 168.889999 | 169.610001 | 169.610001 | 6707400 |
| 3410 | 2021-07-20 | 170.509995 | 170.800003 | 168.919998 | 169.389999 | 169.389999 | 6540100 |
| 3411 | 2021-07-21 | 168.330002 | 169.000000 | 168.139999 | 168.759995 | 168.759995 | 4622600 |
| 3412 | 2021-07-22 | 168.490005 | 169.190002 | 168.059998 | 169.089996 | 169.089996 | 4793600 |
| 3413 | 2021-07-23 | 168.500000 | 168.880005 | 167.949997 | 168.559998 | 168.559998 | 5831000 |
3414 rows × 7 columns
df.isnull().sum()
Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3414 entries, 0 to 3413
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       3414 non-null   datetime64[ns]
 1   Open       3414 non-null   float64
 2   High       3414 non-null   float64
 3   Low        3414 non-null   float64
 4   Close      3414 non-null   float64
 5   Adj Close  3414 non-null   float64
 6   Volume     3414 non-null   int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 186.8 KB
df.describe()
| Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|
| count | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3.414000e+03 |
| mean | 128.154359 | 128.783975 | 127.470006 | 128.146406 | 128.146406 | 1.099452e+07 |
| std | 25.430152 | 25.483592 | 25.317432 | 25.420547 | 25.420547 | 7.173965e+06 |
| min | 69.300003 | 71.889999 | 66.000000 | 70.000000 | 70.000000 | 1.501600e+06 |
| 25% | 113.855000 | 114.335001 | 113.395000 | 113.814999 | 113.814999 | 6.531100e+06 |
| 50% | 122.919998 | 123.330002 | 122.500000 | 122.860001 | 122.860001 | 9.131250e+06 |
| 75% | 146.945004 | 147.580002 | 146.167500 | 146.777496 | 146.777496 | 1.322372e+07 |
| max | 193.740005 | 194.449997 | 192.520004 | 193.889999 | 193.889999 | 9.380420e+07 |
print("Dataframe contains GLD Data from " + str(df['Date'].min()) + " to " + str(df['Date'].max()))
Dataframe contains GLD Data from 2008-01-02 00:00:00 to 2021-07-23 00:00:00
df.head(20)
| Date | Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|---|
| 0 | 2008-01-02 | 83.559998 | 85.139999 | 83.440002 | 84.860001 | 84.860001 | 12291100 |
| 1 | 2008-01-03 | 84.870003 | 85.940002 | 84.599998 | 85.570000 | 85.570000 | 9553900 |
| 2 | 2008-01-04 | 85.339996 | 85.550003 | 84.430000 | 85.129997 | 85.129997 | 8402200 |
| 3 | 2008-01-07 | 85.239998 | 85.260002 | 84.570000 | 84.769997 | 84.769997 | 6944300 |
| 4 | 2008-01-08 | 86.279999 | 87.129997 | 86.160004 | 86.779999 | 86.779999 | 9567900 |
| 5 | 2008-01-09 | 86.559998 | 87.199997 | 86.300003 | 86.550003 | 86.550003 | 10080200 |
| 6 | 2008-01-10 | 86.419998 | 88.459999 | 86.410004 | 88.250000 | 88.250000 | 12916300 |
| 7 | 2008-01-11 | 88.040001 | 88.760002 | 87.849998 | 88.580002 | 88.580002 | 6978800 |
| 8 | 2008-01-14 | 89.449997 | 89.940002 | 89.000000 | 89.540001 | 89.540001 | 10085000 |
| 9 | 2008-01-15 | 89.599998 | 90.349998 | 87.910004 | 87.989998 | 87.989998 | 23853900 |
| 10 | 2008-01-16 | 88.169998 | 88.660004 | 86.320000 | 86.699997 | 86.699997 | 26919000 |
| 11 | 2008-01-17 | 87.500000 | 87.980003 | 86.470001 | 86.500000 | 86.500000 | 13592300 |
| 12 | 2008-01-18 | 87.169998 | 87.459999 | 86.510002 | 87.419998 | 87.419998 | 9094500 |
| 13 | 2008-01-22 | 86.139999 | 88.440002 | 85.769997 | 88.169998 | 88.169998 | 20679600 |
| 14 | 2008-01-23 | 87.160004 | 88.669998 | 86.730003 | 87.889999 | 87.889999 | 14282000 |
| 15 | 2008-01-24 | 89.720001 | 90.250000 | 89.129997 | 90.080002 | 90.080002 | 10627200 |
| 16 | 2008-01-25 | 90.930000 | 91.080002 | 89.500000 | 90.300003 | 90.300003 | 9703900 |
| 17 | 2008-01-28 | 90.959999 | 91.889999 | 90.750000 | 91.750000 | 91.750000 | 8533200 |
| 18 | 2008-01-29 | 91.360001 | 91.720001 | 90.809998 | 91.150002 | 91.150002 | 9091600 |
| 19 | 2008-01-30 | 90.709999 | 92.580002 | 90.449997 | 92.059998 | 92.059998 | 14378100 |
# Draw histogram of each column
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (20,8)
hist = df[["Open","High","Low","Close", "Adj Close", 'Volume']].hist(bins=50)
correlation = df.corr()
import seaborn as sns
# constructing a heatmap to understand the correlation
plt.figure(figsize = (8,8))
sns.heatmap(correlation, cbar=True, square=True, fmt='.1f',annot=True, annot_kws={'size':8}, cmap='Blues')
<AxesSubplot:>
# correlation values
print(correlation)
              Open      High       Low     Close  Adj Close    Volume
Open      1.000000  0.999656  0.999539  0.999283   0.999283  0.023178
High      0.999656  1.000000  0.999368  0.999623   0.999623  0.032707
Low       0.999539  0.999368  1.000000  0.999664   0.999664  0.007415
Close     0.999283  0.999623  0.999664  1.000000   1.000000  0.018593
Adj Close 0.999283  0.999623  0.999664  1.000000   1.000000  0.018593
Volume    0.023178  0.032707  0.007415  0.018593   0.018593  1.000000
# checking the distribution of the Open Price
# (histplot replaces the deprecated distplot; kde=True and stat='density' reproduce the same view)
sns.histplot(df['Open'], color='green', kde=True, stat='density')
<AxesSubplot:xlabel='Open', ylabel='Density'>
X = df.drop(['Date','Open'],axis=1)
Y = df['Open']
print(X)
            High         Low       Close   Adj Close    Volume
0      85.139999   83.440002   84.860001   84.860001  12291100
1      85.940002   84.599998   85.570000   85.570000   9553900
2      85.550003   84.430000   85.129997   85.129997   8402200
3      85.260002   84.570000   84.769997   84.769997   6944300
4      87.129997   86.160004   86.779999   86.779999   9567900
...          ...         ...         ...         ...       ...
3409  169.910004  168.889999  169.610001  169.610001   6707400
3410  170.800003  168.919998  169.389999  169.389999   6540100
3411  169.000000  168.139999  168.759995  168.759995   4622600
3412  169.190002  168.059998  169.089996  169.089996   4793600
3413  168.880005  167.949997  168.559998  168.559998   5831000

[3414 rows x 5 columns]
print(Y)
0 83.559998
1 84.870003
2 85.339996
3 85.239998
4 86.279999
...
3409 169.509995
3410 170.509995
3411 168.330002
3412 168.490005
3413 168.500000
Name: Open, Length: 3414, dtype: float64
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state=2)
regressor = RandomForestRegressor(n_estimators=100)
# training the model
regressor.fit(X_train,Y_train)
RandomForestRegressor()
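One caveat: `train_test_split` shuffles by default, so future rows can land in the training set, which leaks information in a forecasting context. A chronological split (as we do later for the ARIMA models) or scikit-learn's `TimeSeriesSplit` keeps every training index strictly before every test index; a minimal sketch with a stand-in feature matrix:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(20).reshape(-1, 1)  # stand-in feature matrix in time order

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X_demo):
    # Every training index precedes every test index, so no future data leaks
    assert train_idx.max() < test_idx.min()
```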
# prediction on Test Data
test_data_prediction = regressor.predict(X_test)
print(test_data_prediction)
[ 90.44370087 126.43359924 161.76490189 167.80959946 173.24300293
 125.9772007  144.25159988 126.75770027 178.93520065 173.36780258
 ...
 162.64410141 171.79369995 161.12369949 143.15540329 178.65500168
 119.97969879]
(683 predicted values; output truncated for readability)
# R squared score
error_score = metrics.r2_score(Y_test, test_data_prediction)
print("R squared score : ", error_score)
R squared score :  0.9995393946423325
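R² alone can look flattering on strongly trending, highly correlated price data, so scale-dependent errors are also worth reporting. A sketch with small stand-in arrays (in the project these would be `Y_test` and `test_data_prediction`):

```python
import numpy as np

# Stand-in arrays; illustrative values only
y_true = np.array([120.0, 125.0, 130.0, 128.0])
y_pred = np.array([119.5, 125.5, 129.0, 128.5])

mae = np.mean(np.abs(y_true - y_pred))           # average absolute miss, in price units
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # penalizes large misses more heavily
```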
Y_test = list(Y_test)
plt.plot(Y_test, color='blue', label = 'Actual Value')
plt.plot(test_data_prediction, color='green', label='Predicted Value')
plt.title('Actual Price vs Predicted Price')
plt.xlabel('Number of values')
plt.ylabel('GLD Price')
plt.legend()
plt.show()
The actual and predicted prices track each other closely.
df2 = yf.download('GLD', START, TODAYS_DATE)
df2.shape
(3414, 6)
from statsmodels.tsa.stattools import adfuller
adfuller_result = adfuller(df2['Open'])
print('ADF Statistic: ', adfuller_result[0])
print('p-value: ', adfuller_result[1])
ADF Statistic:  -1.721416651678857
p-value:  0.4201131212705324
The p-value is greater than 0.05, so we cannot reject the null hypothesis of a unit root: the data are not stationary.
from math import sqrt
from sklearn.metrics import mean_squared_error, make_scorer, mean_absolute_error
# Analysis imports
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from pmdarima.arima import auto_arima
from prophet import Prophet
import warnings
warnings.filterwarnings('ignore')  # suppress deprecation warnings
# Diff Method
df2_diff = df2.diff().dropna()
diff_adfuller_result = adfuller(df2_diff['Open'])
print('Difference Method ADF Statistic: ', diff_adfuller_result[0])
print('Difference Method p-value: ', diff_adfuller_result[1])
Difference Method ADF Statistic:  -19.47628301710241
Difference Method p-value:  0.0
# Diff Twice Method
df2_difftwice = df2.diff().diff().dropna()
difftwice_adfuller_result = adfuller(df2_difftwice['Open'])
print('Difference Twice Method ADF Statistic: ', difftwice_adfuller_result[0])
print('Difference Twice Method p-value: ', difftwice_adfuller_result[1])
Difference Twice Method ADF Statistic:  -19.69071072739455
Difference Twice Method p-value:  0.0
#Square Root Method
df2_sqrt = np.sqrt(df2).dropna()
sqrt_adfuller_result = adfuller(df2_sqrt['Open'])
print('Square Root Method ADF Statistic: ', sqrt_adfuller_result[0])
print('Square Root Method p-value: ', sqrt_adfuller_result[1])
Square Root Method ADF Statistic:  -1.8167924664616
Square Root Method p-value:  0.3721571442426048
We discard the square root method since its p-value is greater than 0.05 (the transformed series is still non-stationary). Differencing once already yields a stationary series, so we proceed with a single difference.
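Differencing once replaces prices with day-over-day changes, and the transform is easy to invert when a forecast on the differenced scale needs to be mapped back to prices. A minimal sketch with illustrative values:

```python
import pandas as pd

prices = pd.Series([84.9, 85.6, 85.1, 84.8, 86.8])
# Day-over-day changes: the series we test for stationarity
diffed = prices.diff().dropna()
# Invert the transform: cumulative sum of changes, anchored at the first price
recovered = prices.iloc[0] + diffed.cumsum()
```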
# Plot the time series before transformation
plt.figure(figsize=(16,8))
sns.lineplot(x=df2.index, y='Open', data=df2, linewidth=1.5, label='Before transformation').set_title('Time Series - Before Transformation')
plt.show()
# Plot the time series after transformation
plt.figure(figsize=(16,8))
sns.lineplot(x=df2_diff.index, y='Open', data=df2_diff, label='After transformation', color='green').set_title('Stationary Time Series - Diff Once Transformation')
plt.show()
# Plot ACF and PACF of the stationary, once-differenced data
fig_diff, (ax1, ax2) = plt.subplots(2, 1, figsize=(12,8))
# Plot ACF of df2_diff
plot_acf(df2_diff['Open'], lags=15, zero=False, ax=ax1, title='Autocorrelation Diff Once Data')
# Plot PACF of df2_diff
plot_pacf(df2_diff['Open'], lags=15, zero=False, ax=ax2, title='Partial Autocorrelation Diff Once Data')
plt.show()
# Plot ACF and PACF of the non-stationary data to see how different they are from the stationary data
fig_data_gld, (ax1, ax2) = plt.subplots(2, 1, figsize=(12,8))
# Plot ACF of the raw series
plot_acf(df2['Open'], lags=10, zero=False, ax=ax1, title='Autocorrelation Non-Stationary Data')
# Plot PACF of the raw series
plot_pacf(df2['Open'], lags=10, zero=False, ax=ax2, title='Partial Autocorrelation Non-Stationary Data')
plt.show()
# Search for the ideal model order
order_aic_bic = []
# Loop over AR order
for p in range(3):
    # Loop over MA order
    for q in range(3):
        try:
            # Fit model
            model = SARIMAX(df2_diff['Open'], order=(p, 0, q))
            results = model.fit()
            # Store the model order and the AIC/BIC values in order_aic_bic list
            order_aic_bic.append((p, q, results.aic, results.bic))
        except:
            # Print AIC and BIC as None when the fit fails
            print(p, q, None, None)
# Make a dataframe of model orders with AIC/BIC scores
aic_bic_df = pd.DataFrame(order_aic_bic, columns=['p', 'q', 'aic', 'bic'])
print(aic_bic_df)
   p  q           aic           bic
0  0  0  12099.466301  12105.601648
1  0  1  12100.847609  12113.118303
2  0  2  12102.843879  12121.249919
3  1  0  12100.846746  12113.117440
4  1  1  12102.847165  12121.253206
5  1  2  12104.844687  12129.386074
6  2  0  12102.846278  12121.252318
7  2  1  12104.845927  12129.387315
8  2  2  12086.977442  12117.654177
# Sort by AIC
print(aic_bic_df.sort_values('aic'))
   p  q           aic           bic
8  2  2  12086.977442  12117.654177
0  0  0  12099.466301  12105.601648
3  1  0  12100.846746  12113.117440
1  0  1  12100.847609  12113.118303
2  0  2  12102.843879  12121.249919
6  2  0  12102.846278  12121.252318
4  1  1  12102.847165  12121.253206
5  1  2  12104.844687  12129.386074
7  2  1  12104.845927  12129.387315
# Sort by BIC
print(aic_bic_df.sort_values('bic'))
   p  q           aic           bic
0  0  0  12099.466301  12105.601648
3  1  0  12100.846746  12113.117440
1  0  1  12100.847609  12113.118303
8  2  2  12086.977442  12117.654177
2  0  2  12102.843879  12121.249919
6  2  0  12102.846278  12121.252318
4  1  1  12102.847165  12121.253206
5  1  2  12104.844687  12129.386074
7  2  1  12104.845927  12129.387315
Sorting by AIC favors (p, q) = (2, 2), while BIC, which penalizes parameters more heavily, favors (0, 0); the low-order candidates such as (1, 0) and (0, 1) score almost identically.
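For reference: with k estimated parameters, n observations, and log-likelihood L, AIC = 2k − 2L and BIC = k·ln(n) − 2L, and BIC's ln(n) penalty is why it prefers the smaller (0, 0) model here. A quick check against the (0, 0) row of the table above (k = 1 for the innovation variance, n = 3413 differenced observations):

```python
import numpy as np

def aic(loglik: float, k: int) -> float:
    return 2 * k - 2 * loglik

def bic(loglik: float, k: int, n: int) -> float:
    return k * np.log(n) - 2 * loglik

# Recover the log-likelihood of the (0, 0) model from its tabulated AIC
loglik_00 = (2 * 1 - 12099.466301) / 2   # about -6048.733
```

Plugging `loglik_00` back in, `bic(loglik_00, 1, 3413)` reproduces the tabulated BIC of about 12105.60.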
# Split the data into train and test sets
train_data = df2.loc[:'2017']
test_data = df2.loc['2018':]
# Look at train data shape
train_data.shape
(2518, 6)
# Look at train data
train_data.tail()
| Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2017-12-22 | 120.669998 | 121.139999 | 120.570000 | 120.940002 | 120.940002 | 5791300 |
| 2017-12-26 | 121.550003 | 121.870003 | 121.510002 | 121.769997 | 121.769997 | 8224400 |
| 2017-12-27 | 122.000000 | 122.339996 | 121.879997 | 122.230003 | 122.230003 | 6232700 |
| 2017-12-28 | 122.820000 | 122.919998 | 122.559998 | 122.849998 | 122.849998 | 5732700 |
| 2017-12-29 | 123.699997 | 124.089996 | 123.459999 | 123.650002 | 123.650002 | 7852100 |
# Look at test data shape
test_data.shape
(896, 6)
# Look at test data
test_data.head()
| Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2018-01-02 | 124.660004 | 125.180000 | 124.389999 | 125.150002 | 125.150002 | 11762500 |
| 2018-01-03 | 125.050003 | 125.089996 | 124.099998 | 124.820000 | 124.820000 | 7904300 |
| 2018-01-04 | 124.889999 | 125.849998 | 124.739998 | 125.459999 | 125.459999 | 7329700 |
| 2018-01-05 | 124.930000 | 125.480003 | 124.830002 | 125.330002 | 125.330002 | 5739900 |
| 2018-01-08 | 125.199997 | 125.320000 | 124.900002 | 125.309998 | 125.309998 | 3566700 |
# Plot the test and train data
plt.figure(figsize=(16,8))
sns.lineplot(x=train_data.index, y='Open', data=train_data, linewidth=1.5, label='Training data').set_title('GLD Daily Opening Price from Train and Test Data')
sns.lineplot(x=test_data.index, y='Open', data=test_data, linewidth=1.5, label='Test data')
plt.show()
The general specification is `model = SARIMAX(df, order=(p, d, q))`, where:

- p = number of autoregressive lags
- d = order of differencing
- q = number of moving average lags

Based on the previous findings, we set d = 2, p = 0, and q = 1, giving order = (0, 2, 1). When using ARIMA, the forecasted values are actual forecasted prices, not differences.
# Fit a model
model = SARIMAX(train_data['Open'], order=(0,2,1), trend= 'c')
results = model.fit()
# Make in-sample predictions for the last 365 observations of the train data
# dynamic=False produces one-step-ahead forecasts: each forecast uses the full history up to that point
# start=-365 begins the prediction 365 trading days before the end of the training data
pred_365_traindata = results.get_prediction(start=-365, dynamic=False)
# Forecast mean for these 365 days
pred_mean_365_traindata = pred_365_traindata.predicted_mean
# Get confidence intervals of forecast
confidence_intervals = pred_365_traindata.conf_int()
# Select lower and upper confidence limits
lower_limits = confidence_intervals.loc[:,'lower Open']
upper_limits = confidence_intervals.loc[:,'upper Open']
print(pred_mean_365_traindata)
Date
2016-07-21 125.475789
2016-07-22 125.635838
2016-07-25 126.406163
2016-07-26 125.505714
2016-07-27 126.065942
...
2017-12-22 120.033534
2017-12-26 120.633746
2017-12-27 121.514075
2017-12-28 121.964225
2017-12-29 122.784526
Name: predicted_mean, Length: 365, dtype: float64
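The difference between one-step-ahead (`dynamic=False`) and dynamic prediction can be seen with a toy AR(1) recursion (the coefficient `phi` and the observations below are made-up values, not fitted ones):

```python
import numpy as np

phi = 0.9
y = np.array([1.0, 0.8, 0.9, 0.7])  # observed series (illustrative)

# One-step-ahead: each forecast uses the actual previous observation
one_step = phi * y[:-1]

# Dynamic: after the first step, forecasts feed on earlier forecasts
dynamic = [phi * y[0]]
for _ in range(len(y) - 2):
    dynamic.append(phi * dynamic[-1])
dynamic = np.array(dynamic)
```

One-step-ahead errors stay bounded because each forecast is re-anchored to the data, while dynamic forecasts compound their own errors; this is why `dynamic=False` is appropriate for evaluating fit on the training window.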
# Convert pred_mean_365_traindata to a dataframe and inspect it
pred_mean_365_traindata_df = pred_mean_365_traindata.to_frame(name='forecasted_mean')
pred_mean_365_traindata_df.head()
| forecasted_mean | |
|---|---|
| Date | |
| 2016-07-21 | 125.475789 |
| 2016-07-22 | 125.635838 |
| 2016-07-25 | 126.406163 |
| 2016-07-26 | 125.505714 |
| 2016-07-27 | 126.065942 |
# Plot the original training data - Zoom in starting from 2014
plt.figure(figsize=(20,10))
sns.lineplot(x=train_data['2014-01-01 00:00:00':].index, y='Open', data=train_data['2014-01-01 00:00:00':], linewidth=4, label='observed').set_title('One-step ahead Forecast')
# Plot the mean predictions for the last 365 days of training data
sns.lineplot(x=pred_mean_365_traindata_df.index, y=pred_mean_365_traindata_df['forecasted_mean'], data=pred_mean_365_traindata_df, linewidth=1, label='forecast for 365 days', color='red')
# Shade the area between the confidence intervals
plt.fill_between(lower_limits.index, lower_limits, upper_limits, color='pink')
<matplotlib.collections.PolyCollection at 0x7fb1259ae670>
# Create the 4 diagnostic plots
results.plot_diagnostics()
plt.show()
print(results.summary())
SARIMAX Results
==============================================================================
Dep. Variable: Open No. Observations: 2518
Model: SARIMAX(0, 2, 1) Log Likelihood -4463.947
Date: Mon, 26 Jul 2021 AIC 8933.894
Time: 21:41:55 BIC 8951.385
Sample: 0 HQIC 8940.242
- 2518
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
intercept -4.059e-05 5.02e-05 -0.809 0.419 -0.000 5.78e-05
ma.L1 -1.0000 0.047 -21.278 0.000 -1.092 -0.908
sigma2 2.0292 0.101 19.993 0.000 1.830 2.228
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 5247.00
Prob(Q): 0.97 Prob(JB): 0.00
Heteroskedasticity (H): 0.53 Skew: -0.61
Prob(H) (two-sided): 0.00 Kurtosis: 9.97
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
# Calculate Mean Absolute Error of the in-sample residuals (predicted vs. actual Open prices)
residuals = results.resid
mae = np.mean(np.abs(residuals))
print('The Mean Absolute Error of our forecasts is {}'.format(round(mae, 2)))
# Calculate Mean Squared Error over the prediction window (the last 365 trading days of training data)
pred_mean_365_traindata = pred_365_traindata.predicted_mean
real_values = train_data['Open'].iloc[-365:]
mse = ((pred_mean_365_traindata - real_values) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
# Calculate Root Mean Squared Error
rmse = np.sqrt(mse)
print('The Root Mean Squared Error of our forecasts is {}'.format(round(rmse, 2)))
The Mean Absolute Error of our forecasts is 1.06
The Mean Squared Error of our forecasts is 0.99
The Root Mean Squared Error of our forecasts is 0.99
The diagnostics above show that the residuals are correlated and not normally distributed, which means there is structure in the data the model did not capture. We need to refit the model, and we may not be able to use this one for reliable forecasts.
# Use seasonal_decompose to check for a seasonal component
# Additive model: Level + Trend + Seasonality + Noise
# freq=365 is used to represent one year (note the series contains only ~252 trading days per year)
# pd.plotting.register_matplotlib_converters() is needed to avoid
# "TypeError: float() argument must be a string or a number, not Period" when plotting
pd.plotting.register_matplotlib_converters()
decomp_results = seasonal_decompose(df2['Open'], model='additive', freq=365)
# Visualize the data using time-series decomposition
decomp_results.plot()
plt.show()
The plots above show some apparent seasonality.
# auto_arima needs a univariate series, so we keep only the Open column
arima_data = df2.drop(columns=['High', 'Low', 'Close', 'Adj Close', 'Volume'])
arima_data.head()
| Open | |
|---|---|
| Date | |
| 2008-01-02 | 83.559998 |
| 2008-01-03 | 84.870003 |
| 2008-01-04 | 85.339996 |
| 2008-01-07 | 85.239998 |
| 2008-01-08 | 86.279999 |
# m = number of observations per seasonal cycle (e.g. m=7 for a weekly cycle in daily data, m=12 for a yearly cycle in monthly data)
# seasonal=True by default
# stationary=False by default
results = auto_arima(arima_data,
                     seasonal=True,
                     start_p=1,
                     start_q=1,
                     #max_p=3,
                     #max_q=3,
                     start_P=1,
                     start_Q=1,
                     max_P=3,
                     max_Q=3,
                     m=7,
                     information_criterion='aic',
                     trace=True,
                     error_action='ignore',
                     stepwise=True)
Performing stepwise search to minimize aic
 ARIMA(1,1,1)(1,0,1)[7] intercept   : AIC=12107.626, Time=1.27 sec
 ARIMA(0,1,0)(0,0,0)[7] intercept   : AIC=12100.423, Time=0.04 sec
 ARIMA(1,1,0)(1,0,0)[7] intercept   : AIC=12103.622, Time=0.33 sec
 ARIMA(0,1,1)(0,0,1)[7] intercept   : AIC=12103.622, Time=0.74 sec
 ARIMA(0,1,0)(0,0,0)[7]             : AIC=12099.466, Time=0.04 sec
 ARIMA(0,1,0)(1,0,0)[7] intercept   : AIC=12102.203, Time=0.16 sec
 ARIMA(0,1,0)(0,0,1)[7] intercept   : AIC=12102.202, Time=0.20 sec
 ARIMA(0,1,0)(1,0,1)[7] intercept   : AIC=12104.205, Time=0.41 sec
 ARIMA(1,1,0)(0,0,0)[7] intercept   : AIC=12101.831, Time=0.09 sec
 ARIMA(0,1,1)(0,0,0)[7] intercept   : AIC=12101.831, Time=0.37 sec
 ARIMA(1,1,1)(0,0,0)[7] intercept   : AIC=12103.833, Time=0.49 sec

Best model:  ARIMA(0,1,0)(0,0,0)[7]
Total fit time: 4.169 seconds
Based on the lowest AIC score, the best fit is ARIMA(0, 1, 0) with seasonal_order=(0, 0, 0, 7). auto_arima did not detect any seasonal terms and suggests differencing only once (d = 1).
print(results.summary())
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 3414
Model: SARIMAX(0, 1, 0) Log Likelihood -6048.733
Date: Mon, 26 Jul 2021 AIC 12099.466
Time: 21:41:59 BIC 12105.602
Sample: 0 HQIC 12101.659
- 3414
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
sigma2 2.0272 0.022 92.952 0.000 1.984 2.070
===================================================================================
Ljung-Box (L1) (Q): 0.59 Jarque-Bera (JB): 9706.31
Prob(Q): 0.44 Prob(JB): 0.00
Heteroskedasticity (H): 0.63 Skew: -0.60
Prob(H) (two-sided): 0.00 Kurtosis: 11.17
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
results.plot_diagnostics()
plt.show()
# Based on auto_arima, the Best Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 7)
# Split data into training and test sets
auto_arima_train_data = arima_data.loc[:'2017']
auto_arima_test_data = arima_data.loc['2018':]
# Create a model using the orders chosen by auto_arima
# Note: SARIMAX takes the seasonal structure via seasonal_order; it has no seasonal= argument
auto_arima_model = SARIMAX(auto_arima_train_data,
                           order=(0,1,0),
                           seasonal_order=(0,0,0,7),
                           trend='c')
# Fit the model
auto_arima_results = auto_arima_model.fit()
# Make one-step-ahead predictions for the last 365 observations of the training data
# dynamic=False ensures one-step-ahead forecasts: each forecast uses the full observed history up to that point
# start=-365 begins the prediction 365 trading days (not calendar days) before the end of the training data
auto_arima_pred_365_traindata = auto_arima_results.get_prediction(start=-365, dynamic=False)
# Forecast mean for 365 days
auto_arima_pred_mean_365_traindata = auto_arima_pred_365_traindata.predicted_mean
# Get confidence intervals of forecast
auto_arima_confidence_intervals = auto_arima_pred_365_traindata.conf_int()
# Select lower and upper confidence limits
auto_arima_lower_limits = auto_arima_confidence_intervals.loc[:,'lower Open']
auto_arima_upper_limits = auto_arima_confidence_intervals.loc[:,'upper Open']
# Convert auto_arima_pred_mean_365_traindata series to a dataframe
# Inspect auto_arima_pred_mean_365_traindata_df
auto_arima_pred_mean_365_traindata_df = auto_arima_pred_mean_365_traindata.to_frame(name='forecasted_mean')
auto_arima_pred_mean_365_traindata_df.head()
| Date | forecasted_mean |
|---|---|
| 2016-07-21 | 125.515948 |
| 2016-07-22 | 125.675951 |
| 2016-07-25 | 126.445948 |
| 2016-07-26 | 125.545946 |
| 2016-07-27 | 126.105944 |
# Plot the original training data - zoom in starting from 2014
plt.figure(figsize=(20,10))
sns.lineplot(x=train_data['2014-01-01 00:00:00':].index, y='Open', data=train_data['2014-01-01 00:00:00':], linewidth=4, label='observed').set_title('One-step ahead Forecast')
# Plot the mean predictions
sns.lineplot(x=auto_arima_pred_mean_365_traindata_df.index, y=auto_arima_pred_mean_365_traindata_df['forecasted_mean'], data=auto_arima_pred_mean_365_traindata_df, linewidth=1, label='forecast for 365 days', color='red')
# Shade the area between the confidence intervals
plt.fill_between(auto_arima_lower_limits.index, auto_arima_lower_limits, auto_arima_upper_limits, color='pink')
# Create the 4 diagnostic plots
auto_arima_results.plot_diagnostics()
plt.show()
# Print the diagnostic summary results
print(auto_arima_results.summary())
# Calculate the Mean Absolute Error of the model's in-sample residuals (predicted vs. real Open prices)
auto_arima_residuals = auto_arima_results.resid
mae = np.mean(np.abs(auto_arima_residuals))
print('The Mean Absolute Error of our forecasts is {}'.format(round(mae, 2)))
# Calculate Mean Squared Error over the dates where the one-step-ahead predictions overlap the 2016 slice of the training data (indices align; non-overlapping dates drop out as NaN)
auto_arima_pred_mean_365_traindata = auto_arima_pred_365_traindata.predicted_mean
real_values = auto_arima_train_data['2016-01-02':'2016-12-31']['Open']
mse = ((auto_arima_pred_mean_365_traindata - real_values) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
# Calculate Root Mean Squared Error
rmse = np.sqrt(mse)
print('The Root Mean Squared Error of our forecasts is {}'.format(round(rmse, 2)))
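The three error measures used throughout reduce to simple formulas; a toy sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100.0, 102.0, 101.0])  # hypothetical real prices
y_pred = np.array([101.0, 101.0, 103.0])  # hypothetical forecasts

mae = mean_absolute_error(y_true, y_pred)  # mean |error|  = (1 + 1 + 2) / 3
mse = mean_squared_error(y_true, y_pred)   # mean error^2  = (1 + 1 + 4) / 3
rmse = np.sqrt(mse)                        # back in the units of the price

print(round(mae, 3), round(mse, 3), round(rmse, 3))  # → 1.333 2.0 1.414
```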
SARIMAX Results
==============================================================================
Dep. Variable: Open No. Observations: 2518
Model: SARIMAX(0, 1, 0) Log Likelihood -4461.713
Date: Mon, 26 Jul 2021 AIC 8927.426
Time: 21:42:00 BIC 8939.088
Sample: 0 HQIC 8931.658
- 2518
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
intercept 0.0159 0.029 0.551 0.582 -0.041 0.073
sigma2 2.0287 0.028 73.329 0.000 1.974 2.083
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 5122.18
Prob(Q): 0.97 Prob(JB): 0.00
Heteroskedasticity (H): 0.52 Skew: -0.59
Prob(H) (two-sided): 0.00 Kurtosis: 9.89
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
The Mean Absolute Error of our forecasts is 1.04
The Mean Squared Error of our forecasts is 1.0
The Root Mean Squared Error of our forecasts is 1.0
# Create a model using the optimal parameters found by the manual grid search: SARIMAX(0, 1, 2)x(0, 0, 2, 7) - AIC: 12298.183421032392
# Note: SARIMAX takes the seasonal structure via seasonal_order; it has no seasonal= argument
manual_arima_model = SARIMAX(auto_arima_train_data,
                             order=(0,1,2),
                             seasonal_order=(0,0,2,7),
                             trend='c')
# Fit the model
manual_arima_results = manual_arima_model.fit()
# Make one-step-ahead predictions for the last 365 observations of the training data
# dynamic=False ensures one-step-ahead forecasts: each forecast uses the full observed history up to that point
# start=-365 begins the prediction 365 trading days (not calendar days) before the end of the training data
manual_arima_pred_365_traindata = manual_arima_results.get_prediction(start=-365, dynamic=False)
# Forecast mean for 365 days
manual_arima_pred_mean_365_traindata = manual_arima_pred_365_traindata.predicted_mean
# Get confidence intervals of forecast
manual_arima_confidence_intervals = manual_arima_pred_365_traindata.conf_int()
# Select lower and upper confidence limits
manual_arima_lower_limits = manual_arima_confidence_intervals.loc[:,'lower Open']
manual_arima_upper_limits = manual_arima_confidence_intervals.loc[:,'upper Open']
# Convert manual_arima_pred_mean_365_traindata series to a dataframe
# Inspect manual_arima_pred_mean_365_traindata_df
manual_arima_pred_mean_365_traindata_df = manual_arima_pred_mean_365_traindata.to_frame(name='forecasted_mean')
manual_arima_pred_mean_365_traindata_df.tail()
| Date | forecasted_mean |
|---|---|
| 2017-12-22 | 120.064886 |
| 2017-12-26 | 120.667987 |
| 2017-12-27 | 121.551442 |
| 2017-12-28 | 122.010004 |
| 2017-12-29 | 122.836197 |
# Plot the original training data - zoom in starting from 2014
plt.figure(figsize=(20,10))
sns.lineplot(x=train_data['2014-01-01 00:00:00':].index, y='Open', data=train_data['2014-01-01 00:00:00':], linewidth=4, label='observed').set_title('One-step ahead Forecast Using Manual Grid Search')
# Plot the mean predictions
sns.lineplot(x=manual_arima_pred_mean_365_traindata_df.index, y=manual_arima_pred_mean_365_traindata_df['forecasted_mean'], data=manual_arima_pred_mean_365_traindata_df, linewidth=1, label='forecast for 365 days', color='red')
# Shade the area between the confidence intervals
plt.fill_between(manual_arima_lower_limits.index, manual_arima_lower_limits, manual_arima_upper_limits, color='pink')
# Create the 4 diagnostic plots
manual_arima_results.plot_diagnostics()
plt.show()
# Print the diagnostic summary results
print(manual_arima_results.summary())
# Calculate the Mean Absolute Error of the model's in-sample residuals (predicted vs. real Open prices)
manual_arima_residuals = manual_arima_results.resid
mae = np.mean(np.abs(manual_arima_residuals))
print('The Mean Absolute Error of our forecasts is {}'.format(round(mae, 2)))
# Calculate Mean Squared Error over the dates where the one-step-ahead predictions overlap the 2016 slice of the training data (indices align; non-overlapping dates drop out as NaN)
manual_arima_pred_mean_365_traindata = manual_arima_pred_365_traindata.predicted_mean
real_values = auto_arima_train_data['2016-01-02':'2016-12-31']['Open']
mse = ((manual_arima_pred_mean_365_traindata - real_values) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
# Calculate Root Mean Squared Error
rmse = np.sqrt(mse)
print('The Root Mean Squared Error of our forecasts is {}'.format(round(rmse, 2)))
SARIMAX Results
=========================================================================================
Dep. Variable: Open No. Observations: 2518
Model: SARIMAX(0, 1, 2)x(0, 0, 2, 7) Log Likelihood -4460.323
Date: Mon, 26 Jul 2021 AIC 8932.645
Time: 21:42:02 BIC 8967.630
Sample: 0 HQIC 8945.342
- 2518
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
intercept 0.0158 0.028 0.560 0.575 -0.040 0.071
ma.L1 0.0010 0.014 0.075 0.941 -0.026 0.028
ma.L2 -0.0106 0.016 -0.663 0.508 -0.042 0.021
ma.S.L7 -0.0305 0.014 -2.113 0.035 -0.059 -0.002
ma.S.L14 -0.0078 0.016 -0.477 0.633 -0.040 0.024
sigma2 2.0264 0.028 72.152 0.000 1.971 2.081
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 5121.18
Prob(Q): 0.99 Prob(JB): 0.00
Heteroskedasticity (H): 0.53 Skew: -0.59
Prob(H) (two-sided): 0.00 Kurtosis: 9.89
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
The Mean Absolute Error of our forecasts is 1.04
The Mean Squared Error of our forecasts is 1.0
The Root Mean Squared Error of our forecasts is 1.0
# Forecasting out of the sample
auto_arima_forecast = auto_arima_results.get_forecast(steps=len(test_data))
# Forecast mean
auto_arima_mean_forecast = auto_arima_forecast.predicted_mean
# Get confidence intervals of forecast
# Assign it the same index at test data
auto_arima_forecasted_confidence_intervals = auto_arima_forecast.conf_int()
auto_arima_forecasted_confidence_intervals.index = test_data.index #need to do this in order to plot
# Select lower and upper confidence limits
auto_arima_forecasted_lower_limits = auto_arima_forecasted_confidence_intervals.loc[:,'lower Open']
auto_arima_forecasted_upper_limits = auto_arima_forecasted_confidence_intervals.loc[:,'upper Open']
# Convert auto_arima_mean_forecast to a dataframe
# Inspect auto_arima_mean_forecast
auto_arima_mean_forecast_df = auto_arima_mean_forecast.to_frame(name='forecasted_mean')
auto_arima_mean_forecast_df.index = test_data.index
auto_arima_mean_forecast_df.head()
| Date | forecasted_mean |
|---|---|
| 2018-01-02 | 123.715945 |
| 2018-01-03 | 123.731892 |
| 2018-01-04 | 123.747840 |
| 2018-01-05 | 123.763787 |
| 2018-01-08 | 123.779735 |
future_df = pd.bdate_range(start='2022-01-01', end='2022-12-31')  # ISO dates avoid day/month ambiguity
future_df
DatetimeIndex(['2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',
'2022-01-07', '2022-01-10', '2022-01-11', '2022-01-12',
'2022-01-13', '2022-01-14',
...
'2022-12-19', '2022-12-20', '2022-12-21', '2022-12-22',
'2022-12-23', '2022-12-26', '2022-12-27', '2022-12-28',
'2022-12-29', '2022-12-30'],
dtype='datetime64[ns]', length=260, freq='B')
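pd.bdate_range generates business days only (weekends skipped), which is why the 2022 index above has 260 entries rather than 365. A quick sketch:

```python
import pandas as pd

# One calendar week, Monday 2022-01-03 through Sunday 2022-01-09
week = pd.bdate_range(start='2022-01-03', end='2022-01-09')

print(list(week.day_name()))
# → ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
```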
## predict prices for 2022
auto_arima_forecast_future = auto_arima_results.get_forecast(steps=len(future_df))
auto_arima_mean_forecast_future = auto_arima_forecast_future.predicted_mean
auto_arima_mean_forecast_future_df = auto_arima_mean_forecast_future.to_frame(name='forecasted_mean_future')
auto_arima_mean_forecast_future_df.index = future_df
auto_arima_mean_forecast_future_df.head()
|  | forecasted_mean_future |
|---|---|
| 2022-01-03 | 123.715945 |
| 2022-01-04 | 123.731892 |
| 2022-01-05 | 123.747840 |
| 2022-01-06 | 123.763787 |
| 2022-01-07 | 123.779735 |
# Plot the forecasted data set against the test data
plt.figure(figsize=(20,10))
# Plot the train data
sns.lineplot(x=train_data.index, y='Open', data=train_data, linewidth=4, label='observed train data').set_title('Test Data vs Forecasted Data From 2018 to 2021')
# Plot the test data
sns.lineplot(x=test_data.index, y='Open', data=test_data, linewidth=4, label='observed test data')
# Plot the forecast data
sns.lineplot(x=auto_arima_mean_forecast_df.index, y=auto_arima_mean_forecast_df['forecasted_mean'], data=auto_arima_mean_forecast_df, linewidth=1, label='forecast', color='red')
# Shade the area between the confidence intervals
plt.fill_between(auto_arima_forecasted_lower_limits.index, auto_arima_forecasted_lower_limits, auto_arima_forecasted_upper_limits, color='pink')
<matplotlib.collections.PolyCollection at 0x7fb110650040>
# Calculate MAE, MSE, RMSE against the real test values
real_test_values = test_data['Open']
print('MAE: {}'.format(mean_absolute_error(real_test_values, auto_arima_mean_forecast)))
print('MSE: {}'.format(mean_squared_error(real_test_values, auto_arima_mean_forecast)))
print('RMSE: {}'.format(np.sqrt(mean_squared_error(real_test_values, auto_arima_mean_forecast))))
MAE: 18.237783084684676
MSE: 551.7694512622186
RMSE: 23.489773333564088
Looking at the plot, the forecast shows an upward trend that is aligned with the test data: the model correctly predicted that gold prices would rise from 2018 to 2021. The test data also falls within the confidence interval, although the interval is very wide. Exact gold prices are hard to predict, but the model captures the general trend over time.
# For Prophet, we need to create a dataset that is a univariate time series that contains only Open Price. Drop all other columns.
pf_data = df.drop(['High', 'Low', 'Close', 'Adj Close', 'Volume'], axis=1)
#Prophet requires 2 columns with variable names in the time series to be:
#y – Target
#ds – Datetime
pf_data.rename(columns={'Open':'y'}, inplace=True)
pf_data['ds'] = pf_data.Date
#Create train and test data sets
pf_train_data = pf_data.loc[:'2017']
pf_test_data = pf_data.loc['2018':]
# Fitting the prophet model
#pf_model = Prophet(changepoint_prior_scale=0.1, daily_seasonality=True)
pf_model = Prophet(daily_seasonality=True)
pf_model.fit(pf_train_data)
#Create future prices & predict prices
pf_future_prices = pf_model.make_future_dataframe(periods=len(pf_test_data))
pf_forecast = pf_model.predict(pf_future_prices)
pf_forecast[-(len(pf_test_data)):]
|  | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | daily | ... | weekly | weekly_lower | weekly_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018 | 2016-01-07 | 96.867406 | 96.535164 | 108.427963 | 96.867406 | 96.867406 | 5.727533 | 5.727533 | 5.727533 | 6.782889 | ... | 0.270896 | 0.270896 | 0.270896 | -1.326252 | -1.326252 | -1.326252 | 0.0 | 0.0 | 0.0 | 102.594939 |
| 2019 | 2016-01-08 | 96.832036 | 97.217720 | 108.411995 | 96.832036 | 96.832036 | 5.816700 | 5.816700 | 5.816700 | 6.782889 | ... | 0.189521 | 0.189521 | 0.189521 | -1.155710 | -1.155710 | -1.155710 | 0.0 | 0.0 | 0.0 | 102.648736 |
| 2020 | 2016-01-09 | 96.796665 | 96.218423 | 107.159322 | 96.796665 | 96.796665 | 4.954563 | 4.954563 | 4.954563 | 6.782889 | ... | -0.847861 | -0.847861 | -0.847861 | -0.980465 | -0.980465 | -0.980465 | 0.0 | 0.0 | 0.0 | 101.751228 |
| 2021 | 2016-01-10 | 96.761295 | 96.004748 | 107.618474 | 96.761295 | 96.761295 | 5.131685 | 5.131685 | 5.131685 | 6.782889 | ... | -0.847861 | -0.847861 | -0.847861 | -0.803342 | -0.803342 | -0.803342 | 0.0 | 0.0 | 0.0 | 101.892981 |
| 2022 | 2016-01-11 | 96.725925 | 97.483753 | 109.106435 | 96.725925 | 96.725925 | 6.621625 | 6.621625 | 6.621625 | 6.782889 | ... | 0.465838 | 0.465838 | 0.465838 | -0.627102 | -0.627102 | -0.627102 | 0.0 | 0.0 | 0.0 | 103.347550 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3409 | 2019-10-29 | 47.667235 | -121.279027 | 245.055165 | -126.019583 | 237.262877 | 7.607164 | 7.607164 | 7.607164 | 6.782889 | ... | 0.367362 | 0.367362 | 0.367362 | 0.456913 | 0.456913 | 0.456913 | 0.0 | 0.0 | 0.0 | 55.274399 |
| 3410 | 2019-10-30 | 47.631864 | -116.022740 | 245.353362 | -126.202422 | 237.353944 | 7.573755 | 7.573755 | 7.573755 | 6.782889 | ... | 0.402105 | 0.402105 | 0.402105 | 0.388761 | 0.388761 | 0.388761 | 0.0 | 0.0 | 0.0 | 55.205619 |
| 3411 | 2019-10-31 | 47.596494 | -119.348110 | 244.422361 | -126.374792 | 237.445010 | 7.378596 | 7.378596 | 7.378596 | 6.782889 | ... | 0.270896 | 0.270896 | 0.270896 | 0.324812 | 0.324812 | 0.324812 | 0.0 | 0.0 | 0.0 | 54.975090 |
| 3412 | 2019-11-01 | 47.561124 | -118.830447 | 245.691672 | -126.547162 | 237.614015 | 7.237945 | 7.237945 | 7.237945 | 6.782889 | ... | 0.189521 | 0.189521 | 0.189521 | 0.265535 | 0.265535 | 0.265535 | 0.0 | 0.0 | 0.0 | 54.799068 |
| 3413 | 2019-11-02 | 47.525753 | -121.318708 | 242.889992 | -126.719533 | 237.913565 | 6.146389 | 6.146389 | 6.146389 | 6.782889 | ... | -0.847861 | -0.847861 | -0.847861 | 0.211362 | 0.211362 | 0.211362 | 0.0 | 0.0 | 0.0 | 53.672142 |
1396 rows × 22 columns
# Plot the forecasted data set against the test data
plt.figure(figsize=(20,10))
# Plot the train data
sns.lineplot(x=pf_train_data.index, y='y', data=pf_train_data, linewidth=4, label='observed train data').set_title('Test vs Forecast From 2018 to 2021')
sns.lineplot(x=pf_test_data.index, y='y', data=pf_test_data, linewidth=1.5, label='Test data')
sns.lineplot(x=pf_test_data.index, y=pf_forecast['yhat'][-(len(pf_test_data)):], color='red', label='Predicted stock price')
plt.fill_between(pf_test_data.index, pf_forecast['yhat_lower'][-(len(pf_test_data)):], pf_forecast['yhat_upper'][-(len(pf_test_data)):], color='pink')
plt.show()
# Calculate MAE, MSE, RMSE against the real test values
real_test_values = pf_test_data['y']
print('MAE: {}'.format(mean_absolute_error(real_test_values, pf_forecast['yhat'][-(len(pf_test_data)):])))
print('MSE: {}'.format(mean_squared_error(real_test_values, pf_forecast['yhat'][-(len(pf_test_data)):])))
print('RMSE: {}'.format(np.sqrt(mean_squared_error(real_test_values, pf_forecast['yhat'][-(len(pf_test_data)):]))))
MAE: 56.366238521925276
MSE: 4398.074128279379
RMSE: 66.31797741396656
fig1 = pf_model.plot(pf_forecast)
fig2 = pf_model.plot_components(pf_forecast)
from prophet.plot import plot_plotly
import plotly.offline as py
py.init_notebook_mode()
plotly_fig = plot_plotly(pf_model, pf_forecast) # this returns a plotly figure
py.iplot(plotly_fig)
Looking at the plots produced by Prophet, the forecast is nearly flat while the test data trends upward, so the forecast does not track the test data. Prophet does not appear to be as accurate as the ARIMA model here: its MAE, MSE and RMSE are all higher than the ARIMA results.
from prophet import Prophet
!pip install git+https://github.com/AutoViML/Auto_TS.git --user
Collecting git+https://github.com/AutoViML/Auto_TS.git
  Cloning https://github.com/AutoViML/Auto_TS.git to /private/var/folders/zg/hmfj1k517rvbwxt7zymkfnm80000gn/T/pip-req-build-ujac4j7u
from auto_ts import auto_timeseries
Imported auto_timeseries version:0.0.36. Call by using:
model = auto_timeseries(score_type='rmse',
time_interval='M',
non_seasonal_pdq=None, seasonality=False, seasonal_period=12,
model_type=['best'],
verbose=2)
model.fit(traindata, ts_column,target)
model.predict(testdata, model='best')
# Read data (yfinance is required for the downloads below)
import yfinance as yf

START = '2008-01-01'
TODAYS_DATE = date.today()
df1 = yf.download('GLD', START, TODAYS_DATE)
df2 = yf.download('SPY', START, TODAYS_DATE)
df3 = yf.download('CL', START, TODAYS_DATE)
df4 = yf.download('SLV', START, TODAYS_DATE)
print(df1.shape)
print(df2.shape)
print(df3.shape)
print(df4.shape)
(3414, 6) (3414, 6) (3414, 6) (3414, 6)
df1.columns = ['Open_GLD', 'High_GLD', 'Low_GLD', 'Close_GLD', 'Adj Close_GLD', 'Volume_GLD']
df2.columns = ['Open_SPX', 'High_SPX', 'Low_SPX', 'Close_SPX', 'Adj Close_SPX', 'Volume_SPX']
df3.columns = ['Open_OIL', 'High_OIL', 'Low_OIL', 'Close_OIL', 'Adj Close_OIL', 'Volume_OIL']
df4.columns = ['Open_SLV', 'High_SLV', 'Low_SLV', 'Close_SLV', 'Adj Close_SLV', 'Volume_SLV']
merged_df1 = pd.merge(df1, df2, on="Date")
merged_df2 = pd.merge(merged_df1, df3, on="Date")
df_mv = pd.merge(merged_df2, df4, on="Date")
df_mv.info()
&lt;class 'pandas.core.frame.DataFrame'&gt;
DatetimeIndex: 3414 entries, 2008-01-02 to 2021-07-23
Data columns (total 24 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Open_GLD       3414 non-null   float64
 1   High_GLD       3414 non-null   float64
 2   Low_GLD        3414 non-null   float64
 3   Close_GLD      3414 non-null   float64
 4   Adj Close_GLD  3414 non-null   float64
 5   Volume_GLD     3414 non-null   int64
 6   Open_SPX       3414 non-null   float64
 7   High_SPX       3414 non-null   float64
 8   Low_SPX        3414 non-null   float64
 9   Close_SPX      3414 non-null   float64
 10  Adj Close_SPX  3414 non-null   float64
 11  Volume_SPX     3414 non-null   int64
 12  Open_OIL       3414 non-null   float64
 13  High_OIL       3414 non-null   float64
 14  Low_OIL        3414 non-null   float64
 15  Close_OIL      3414 non-null   float64
 16  Adj Close_OIL  3414 non-null   float64
 17  Volume_OIL     3414 non-null   int64
 18  Open_SLV       3414 non-null   float64
 19  High_SLV       3414 non-null   float64
 20  Low_SLV        3414 non-null   float64
 21  Close_SLV      3414 non-null   float64
 22  Adj Close_SLV  3414 non-null   float64
 23  Volume_SLV     3414 non-null   int64
dtypes: float64(20), int64(4)
memory usage: 666.8 KB
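Because every yfinance frame is indexed by trading date, `pd.merge(..., on="Date")` above joins the four frames on their shared `Date` index, keeping only dates present in both sides of each merge. A minimal sketch of the same join, using hypothetical two-day frames in place of the real downloads:

```python
import pandas as pd

# Hypothetical two-day frames, mirroring the structure of the yfinance
# downloads above: one column per ticker, indexed by a named "Date" index.
idx = pd.to_datetime(["2008-01-02", "2008-01-03"])
gld = pd.DataFrame({"Open_GLD": [83.56, 84.87]}, index=pd.Index(idx, name="Date"))
slv = pd.DataFrame({"Open_SLV": [14.856, 15.099]}, index=pd.Index(idx, name="Date"))

# Merging on the named index level aligns the tickers on their common
# trading days; the Date index is preserved in the result.
merged = pd.merge(gld, slv, on="Date")
print(merged.shape)  # (2, 2)
```

An inner merge like this silently drops dates missing from either frame, which is why all four frames must cover the same trading calendar for the row count to stay at 3414.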
df_mv.shape
(3414, 24)
df_mv.describe()
| | Open_GLD | High_GLD | Low_GLD | Close_GLD | Adj Close_GLD | Volume_GLD | Open_SPX | High_SPX | Low_SPX | Close_SPX | ... | Low_OIL | Close_OIL | Adj Close_OIL | Volume_OIL | Open_SLV | High_SLV | Low_SLV | Close_SLV | Adj Close_SLV | Volume_SLV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3.414000e+03 | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | ... | 3414.000000 | 3414.000000 | 3414.000000 | 3.414000e+03 | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3414.000000 | 3.414000e+03 |
| mean | 128.154359 | 128.783975 | 127.470006 | 128.146406 | 128.146406 | 1.099452e+07 | 201.099452 | 202.223436 | 199.868248 | 201.125747 | ... | 58.197853 | 58.642025 | 51.232492 | 4.387527e+06 | 19.560388 | 19.737974 | 19.362859 | 19.556081 | 19.556081 | 1.490498e+07 |
| std | 25.430152 | 25.483592 | 25.317432 | 25.420547 | 25.420547 | 7.173965e+06 | 82.201470 | 82.479403 | 81.888727 | 82.221224 | ... | 14.930987 | 14.987761 | 16.860934 | 2.354142e+06 | 6.618265 | 6.704136 | 6.504675 | 6.615612 | 6.615612 | 1.822469e+07 |
| min | 69.300003 | 71.889999 | 66.000000 | 70.000000 | 70.000000 | 1.501600e+06 | 67.949997 | 70.000000 | 67.099998 | 68.110001 | ... | 27.180000 | 27.385000 | 20.262321 | 7.332000e+05 | 8.710000 | 9.050000 | 8.450000 | 8.850000 | 8.850000 | 1.731500e+06 |
| 25% | 113.855000 | 114.335001 | 113.395000 | 113.814999 | 113.814999 | 6.531100e+06 | 131.710007 | 132.639999 | 130.932495 | 131.772495 | ... | 42.232500 | 42.551250 | 33.482352 | 2.875150e+06 | 15.280000 | 15.390000 | 15.170000 | 15.286250 | 15.286250 | 6.371000e+06 |
| 50% | 122.919998 | 123.330002 | 122.500000 | 122.860001 | 122.860001 | 9.131250e+06 | 195.479996 | 196.550003 | 194.510002 | 195.655006 | ... | 64.000000 | 64.510002 | 56.324842 | 3.809150e+06 | 16.914001 | 17.039500 | 16.765000 | 16.889999 | 16.889999 | 9.408400e+06 |
| 75% | 146.945004 | 147.580002 | 146.167500 | 146.777496 | 146.777496 | 1.322372e+07 | 263.812492 | 265.610008 | 261.757507 | 263.737511 | ... | 69.919998 | 70.527498 | 64.700947 | 5.219475e+06 | 22.690001 | 22.895000 | 22.469999 | 22.697501 | 22.697501 | 1.673290e+07 |
| max | 193.740005 | 194.449997 | 192.520004 | 193.889999 | 193.889999 | 9.380420e+07 | 437.519989 | 440.299988 | 436.790009 | 439.940002 | ... | 85.379997 | 86.260002 | 84.843834 | 3.293900e+07 | 47.619999 | 48.349998 | 46.549999 | 47.259998 | 47.259998 | 2.954000e+08 |
8 rows × 24 columns
for column in df_mv.columns:
    print(column, df_mv[column].nunique())
Open_GLD 2717
High_GLD 2719
Low_GLD 2718
Close_GLD 2706
Adj Close_GLD 2706
Volume_GLD 3376
Open_SPX 3167
High_SPX 3141
Low_SPX 3135
Close_SPX 3148
Adj Close_SPX 3329
Volume_SPX 3414
Open_OIL 2423
High_OIL 2430
Low_OIL 2436
Close_OIL 2448
Adj Close_OIL 3208
Volume_OIL 3296
Open_SLV 1694
High_SLV 1686
Low_SLV 1644
Close_SLV 1639
Adj Close_SLV 1639
Volume_SLV 3382
df_mv.head()
| | Open_GLD | High_GLD | Low_GLD | Close_GLD | Adj Close_GLD | Volume_GLD | Open_SPX | High_SPX | Low_SPX | Close_SPX | ... | Low_OIL | Close_OIL | Adj Close_OIL | Volume_OIL | Open_SLV | High_SLV | Low_SLV | Close_SLV | Adj Close_SLV | Volume_SLV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2008-01-02 | 83.559998 | 85.139999 | 83.440002 | 84.860001 | 84.860001 | 12291100 | 146.529999 | 146.990005 | 143.880005 | 144.929993 | ... | 38.430000 | 38.669998 | 27.983322 | 4151800 | 14.856 | 15.188 | 14.856 | 15.180 | 15.180 | 6721000 |
| 2008-01-03 | 84.870003 | 85.940002 | 84.599998 | 85.570000 | 85.570000 | 9553900 | 144.910004 | 145.490005 | 144.070007 | 144.860001 | ... | 38.669998 | 38.945000 | 28.182341 | 3275800 | 15.099 | 15.300 | 15.015 | 15.285 | 15.285 | 5530000 |
| 2008-01-04 | 85.339996 | 85.550003 | 84.430000 | 85.129997 | 85.129997 | 8402200 | 143.339996 | 143.440002 | 140.910004 | 141.309998 | ... | 38.695000 | 39.389999 | 28.504347 | 6706200 | 15.251 | 15.265 | 15.014 | 15.167 | 15.167 | 4055000 |
| 2008-01-07 | 85.239998 | 85.260002 | 84.570000 | 84.769997 | 84.769997 | 6944300 | 141.809998 | 142.229996 | 140.100006 | 141.190002 | ... | 39.605000 | 39.985001 | 28.934919 | 6082200 | 15.161 | 15.234 | 14.967 | 15.053 | 15.053 | 5132000 |
| 2008-01-08 | 86.279999 | 87.129997 | 86.160004 | 86.779999 | 86.779999 | 9567900 | 142.080002 | 142.899994 | 138.440002 | 138.910004 | ... | 39.630001 | 39.990002 | 28.938541 | 8614400 | 15.320 | 15.627 | 15.320 | 15.590 | 15.590 | 6153000 |
5 rows × 24 columns
## Checking for correlation
import matplotlib.pyplot as plt

corr_mat = df_mv.corr()
fig = plt.figure(figsize=(15, 7))
sns.heatmap(corr_mat, annot=True)
plt.show()
df_mv = df_mv.drop(columns=['Close_GLD', 'High_GLD', 'Low_GLD', 'Adj Close_GLD',
'Volume_GLD', 'Open_SPX', 'High_SPX', 'Low_SPX', 'Close_SPX',
'Adj Close_SPX', 'Volume_SPX', 'Open_OIL', 'High_OIL', 'Low_OIL',
'Close_OIL', 'Adj Close_OIL', 'Volume_OIL', 'Close_SLV', 'High_SLV',
'Low_SLV', 'Adj Close_SLV', 'Volume_SLV'])
df_mv
| | Open_GLD | Open_SLV |
|---|---|---|
| Date | ||
| 2008-01-02 | 83.559998 | 14.856000 |
| 2008-01-03 | 84.870003 | 15.099000 |
| 2008-01-04 | 85.339996 | 15.251000 |
| 2008-01-07 | 85.239998 | 15.161000 |
| 2008-01-08 | 86.279999 | 15.320000 |
| ... | ... | ... |
| 2021-07-19 | 169.509995 | 23.469999 |
| 2021-07-20 | 170.509995 | 23.240000 |
| 2021-07-21 | 168.330002 | 23.190001 |
| 2021-07-22 | 168.490005 | 23.320000 |
| 2021-07-23 | 168.500000 | 23.350000 |
3414 rows × 2 columns
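The drop above keeps only `Open_GLD` (the target) and `Open_SLV` (the predictor): the heatmap shows that the OHLC and adjusted-close columns within each ticker are almost perfectly correlated, so retaining one per ticker loses little information. A minimal sketch of that redundancy, using hypothetical synthetic prices rather than the real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic prices: a random-walk "open" and a "close" that
# differs only by small intraday noise, mimicking the near-perfect
# correlation between OHLC columns of the same ticker.
open_px = 100 + np.cumsum(rng.normal(0, 1, 1000))
close_px = open_px + rng.normal(0, 0.1, 1000)

# Pearson correlation between the two columns; r is very close to 1,
# so keeping both adds almost no information.
r = np.corrcoef(open_px, close_px)[0, 1]
assert r > 0.99
```

This is the same reasoning the heatmap supports visually: columns whose pairwise correlation is near 1 are interchangeable for modeling purposes.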
# checking the distribution of the GLD opening price
sns.distplot(df_mv['Open_GLD'], color='green')
<AxesSubplot:xlabel='Open_GLD', ylabel='Density'>
sns.distplot(df_mv['Open_SLV'], color='blue')
<AxesSubplot:xlabel='Open_SLV', ylabel='Density'>
ts_column = 'Date'
target = 'Open_GLD'
sep = ','
FORECAST_PERIOD = 1095
train = df_mv[:-FORECAST_PERIOD]
test = df_mv[-FORECAST_PERIOD:]
print(train.shape, test.shape)
(2319, 2) (1095, 2)
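The split above is chronological rather than random: the last `FORECAST_PERIOD` rows form the test set, so the model is never trained on observations that come after anything it is asked to predict. The same idea can be sketched with a hypothetical stand-in series:

```python
import numpy as np

FORECAST_PERIOD = 5

# Hypothetical daily series; in the notebook this role is played by df_mv.
series = np.arange(20)

# Chronological split: train on everything except the final FORECAST_PERIOD
# observations, test on those held-out most recent values. No shuffling,
# so no future data leaks into training.
train, test = series[:-FORECAST_PERIOD], series[-FORECAST_PERIOD:]
print(train.shape, test.shape)  # (15,) (5,)
```

A random split would be inappropriate here, since shuffling would let the model see values from the forecast horizon during training.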
model = auto_timeseries(
score_type='rmse',
model_type='best', verbose=2
)
model.fit(
traindata=train,
ts_column=ts_column,
target=target,
cv=3,
sep=sep)
Start of Fit.....
Running Augmented Dickey-Fuller test with parameters:
maxlag: 31 regression: c autolag: BIC
Results of Augmented Dickey-Fuller Test:
+-----------------------------+------------------------------+
| | Dickey-Fuller Augmented Test |
+-----------------------------+------------------------------+
| Test Statistic | -1.853601090633292 |
| p-value | 0.3541932877214006 |
| #Lags Used | 0.0 |
| Number of Observations Used | 2318.0 |
| Critical Value (1%) | -3.433174226216983 |
| Critical Value (5%) | -2.862787685084787 |
| Critical Value (10%) | -2.5674341983695146 |
+-----------------------------+------------------------------+
this series is non-stationary. Trying test again after differencing...
After differencing=1, results of Augmented Dickey-Fuller Test:
+-----------------------------+------------------------------+
| | Dickey-Fuller Augmented Test |
+-----------------------------+------------------------------+
| Test Statistic | -48.05293518099748 |
| p-value | 0.0 |
| #Lags Used | 0.0 |
| Number of Observations Used | 2317.0 |
| Critical Value (1%) | -3.433175446486468 |
| Critical Value (5%) | -2.8627882239194244 |
| Critical Value (10%) | -2.5674344852583286 |
+-----------------------------+------------------------------+
this series is stationary
Target variable given as = Open_GLD
Start of loading of data.....
Input is data frame. Performing Time Series Analysis
ts_column: Date sep: , target: Open_GLD
Loaded pandas dataframe...
pandas Dataframe loaded successfully. Shape of data set = (2319, 2)
chart frequency not known. Continuing...
Time Interval between observations has not been provided. Auto_TS will try to infer this now...
Time series input in days = 1
It is a Daily time series.
WARNING: Running best models will take time... Be Patient...
==================================================
Building Prophet Model
==================================================
Running Facebook Prophet Model...
Fit-Predict data (shape=(2319, 3)) with Confidence Interval = 0.95...
Starting Prophet Fit
No seasonality assumed since seasonality flag is set to False
Starting Prophet Cross Validation
Max. iterations using expanding window cross validation = 3
Fold Number: 1 --> Train Shape: 2304 Test Shape: 5
Root Mean Squared Error predictions vs actuals = 3.63
Std Deviation of actuals = 1.11
Normalized RMSE = 327%
Cross Validation window: 1 completed
Fold Number: 2 --> Train Shape: 2309 Test Shape: 5
Root Mean Squared Error predictions vs actuals = 5.59
Std Deviation of actuals = 1.07
Normalized RMSE = 524%
Cross Validation window: 2 completed
Fold Number: 3 --> Train Shape: 2314 Test Shape: 5
Root Mean Squared Error predictions vs actuals = 3.65
Std Deviation of actuals = 1.31
Normalized RMSE = 278%
Cross Validation window: 3 completed
-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
MAE (as % Std Dev of Actuals) = 208.02%
MAPE (Mean Absolute Percent Error) = 3%
RMSE (Root Mean Squared Error) = 4.3882
Normalized RMSE (MinMax) = 81%
Normalized RMSE (as Std Dev of Actuals)= 237%
Time Taken = 17 seconds
End of Prophet Fit
==================================================
Building Auto SARIMAX Model
==================================================
Running Auto SARIMAX Model...
Best Parameters:
p: None, d: None, q: None
P: None, D: None, Q: None
Seasonality: False
Seasonal Period: 12
Fold Number: 1 --> Train Shape: 2304 Test Shape: 5
Finding the best parameters using AutoArima:
Using smaller parameters for larger dataset with greater than 1000 samples
Best model is a Seasonal SARIMAX(2,1,0)*(0,0,0,12), aic = 6355.761
Static Forecasts:
RMSE = 1.32
Std Deviation of Actuals = 1.11
Normalized RMSE = 118.7%
Fold Number: 2 --> Train Shape: 2309 Test Shape: 5
Finding the best parameters using AutoArima:
Using smaller parameters for larger dataset with greater than 1000 samples
Best model is a Seasonal SARIMAX(2,1,0)*(0,0,0,12), aic = 6367.289
Static Forecasts:
RMSE = 0.83
Std Deviation of Actuals = 1.07
Normalized RMSE = 77.8%
Fold Number: 3 --> Train Shape: 2314 Test Shape: 5
Finding the best parameters using AutoArima:
Using smaller parameters for larger dataset with greater than 1000 samples
Best model is a Seasonal SARIMAX(2,1,0)*(0,0,0,12), aic = 6376.858
Static Forecasts:
RMSE = 1.23
Std Deviation of Actuals = 1.31
Normalized RMSE = 93.7%
SARIMAX RMSE (all folds): 1.1267
SARIMAX Norm RMSE (all folds): 5%
-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
MAE (as % Std Dev of Actuals) = 47.37%
MAPE (Mean Absolute Percent Error) = 1%
RMSE (Root Mean Squared Error) = 1.1466
Normalized RMSE (MinMax) = 21%
Normalized RMSE (as Std Dev of Actuals)= 62%
Finding the best parameters using AutoArima:
Using smaller parameters for larger dataset with greater than 1000 samples
Best model is a Seasonal SARIMAX(2,1,0)*(0,0,0,12), aic = 6390.667
Refitting data with previously found best parameters
Best aic metric = 6389.5
SARIMAX Results
==============================================================================
Dep. Variable: Open_GLD No. Observations: 2319
Model: SARIMAX(2, 1, 0) Log Likelihood -3188.755
Date: Mon, 26 Jul 2021 AIC 6389.509
Time: 21:42:51 BIC 6423.995
Sample: 0 HQIC 6402.078
- 2319
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
intercept 0.0416 0.041 1.028 0.304 -0.038 0.121
drift -2.506e-05 3.42e-05 -0.733 0.464 -9.21e-05 4.2e-05
Open_SLV 2.1725 0.015 144.513 0.000 2.143 2.202
ar.L1 0.0540 0.013 4.224 0.000 0.029 0.079
ar.L2 -0.0566 0.011 -5.182 0.000 -0.078 -0.035
sigma2 0.9192 0.012 78.544 0.000 0.896 0.942
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 11068.43
Prob(Q): 0.98 Prob(JB): 0.00
Heteroskedasticity (H): 0.70 Skew: 0.53
Prob(H) (two-sided): 0.00 Kurtosis: 13.66
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
===============================================
Skipping VAR Model since dataset is > 1000 rows and it will take too long
===============================================
==================================================
Building ML Model
==================================================
Running Machine Learning Models...
Shifting 1 predictors by lag=4 to align prior predictor with current target...
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
1 Predictors classified...
No variables removed since no ID or low-information variables found in data set
Fitting ML model
19 variables used in training ML model = ['Open_SLV(t)', 'Open_SLV(t-4)', 'Open_GLD(t-4)', 'Open_SLV(t-3)', 'Open_GLD(t-3)', 'Open_SLV(t-2)', 'Open_GLD(t-2)', 'Open_SLV(t-1)', 'Open_GLD(t-1)', 'Date_hour', 'Date_minute', 'Date_dayofweek', 'Date_quarter', 'Date_month', 'Date_year', 'Date_dayofyear', 'Date_dayofmonth', 'Date_weekofyear', 'Date_weekend']
Running Cross Validation using XGBoost model..
Max. iterations using expanding window cross validation = 3
Fold Number: 1 --> Train Shape: 2300 Test Shape: 5
Exception occurred while building ML model...
dlsym(0x7fb0af390f90, XGBoosterGetStrFeatureInfo): symbol not found
For ML model, evaluation score is not available.
Best Model is: auto_SARIMAX
Best Model (Mean CV) Score: 1.13
--------------------------------------------------
Total time taken: 34 seconds.
--------------------------------------------------
Leaderboard with best model on top of list:
name rmse
1 auto_SARIMAX 1.126715
0 Prophet 4.291096
2 ML inf
<auto_ts.auto_timeseries at 0x7fb0a007abe0>
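The ADF results in the fit log above show the level series is non-stationary while its first difference is stationary, which is why the selected model is SARIMAX(2,1,0) with d = 1. First differencing can be sketched with numpy on a hypothetical random walk (the classic non-stationary series):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical random walk: non-stationary in levels (its mean drifts),
# built by cumulatively summing i.i.d. steps.
steps = rng.normal(0, 1, 2000)
walk = np.cumsum(steps)

# First differencing (the d=1 in SARIMAX(2,1,0)) recovers the underlying
# i.i.d. steps, which ARE stationary.
diff = np.diff(walk)
assert np.allclose(diff, steps[1:])
```

This is exactly what auto_ts did internally: it failed to reject the unit root on the raw series, differenced once, reran the ADF test, and concluded stationarity.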
model.plot_cv_scores()
<AxesSubplot:xlabel='Model', ylabel='CV Scores'>
pred = model.predict(testdata=test, model='auto_SARIMAX', simple=False)
pred
| Open_GLD | yhat | mean_se | mean_ci_lower | mean_ci_upper |
|---|---|---|---|---|
| 2319 | 117.067809 | 0.958773 | 115.188649 | 118.946968 |
| 2320 | 117.292946 | 1.393009 | 114.562699 | 120.023193 |
| 2321 | 117.305648 | 1.691262 | 113.990836 | 120.620461 |
| 2322 | 117.550084 | 1.941480 | 113.744854 | 121.355314 |
| 2323 | 117.489678 | 2.164091 | 113.248139 | 121.731218 |
| ... | ... | ... | ... | ... |
| 3409 | 99.452329 | 31.589292 | 37.538455 | 161.366204 |
| 3410 | 98.908929 | 31.603764 | 36.966689 | 160.851169 |
| 3411 | 98.756560 | 31.618230 | 36.785968 | 160.727152 |
| 3412 | 98.995217 | 31.632689 | 36.996286 | 160.994149 |
| 3413 | 99.016600 | 31.647141 | 36.989343 | 161.043858 |
1095 rows × 4 columns
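With `pred['yhat']` in hand, the held-out forecasts can be scored against the actual test prices using RMSE, the same metric auto_ts used to rank models. A small sketch with hypothetical arrays standing in for `test['Open_GLD']` and `pred['yhat']`:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical stand-ins for the real test prices and SARIMAX forecasts.
actual = [117.0, 118.0, 119.0, 120.0]
predicted = [117.1, 117.8, 119.3, 119.9]
print(round(rmse(actual, predicted), 4))  # 0.1936
```

The widening `mean_ci_lower`/`mean_ci_upper` band in the table above is the usual caveat: uncertainty grows rapidly over a 1095-day horizon, so point forecasts far into the future should be read alongside their intervals.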
The AutoTS run also selected auto_SARIMAX (with first-order differencing) as the best model, so we decided to proceed with this method.
Gold is a major financial asset for countries and central banks. Banks use it to hedge against loans made to their governments and as an indicator of economic health, and it can be viewed like a currency. In many countries, people are physically and emotionally attached to gold, and it has always been a go-to investment. This app is designed to predict gold prices to help with decisions about buying, selling, or holding the commodity. https://predict-gold-price.herokuapp.com/
Disclaimer: Please review this disclaimer carefully before using the model hosted and operated by Group 5. The content and material displayed are the intellectual property of Group 5. The information, content, and/or principles may not be reused, republished, or reprinted without the formal consent of all members. This model is intended for informational and educational purposes only and is not a substitute for insight, feedback, or advice from professionals or third parties. Use at your own discretion. Although procedures were followed to ensure the completeness of the data presented, the authors cannot guarantee that it is free of errors, mistakes, or misinformation.